
Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation.

Author: Bolleman J.
Bougueleret L.
Breuza L.
Bridge A.
Estreicher A.
Famiglietti M.L.
Gos A.
Géhant S.
Poux S.
Redaschi N.
Xenarios I.
Publication venue: Wiley
Publication date: 01/01/2014

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants, resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, and biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.

Infoscience - École polytechnique fédérale de Lausanne

The SwissLipids knowledgebase for lipid biology.

Author: Aimo L.
Bougueleret L.
Bridge A.
David F.P.
Gleizes A.
Götz L.
Hyka-Nouspikel N.
Kuznetsov D.
Liechti R.
Niknejad A.
Riezman H.
van der Goot F.G.
Xenarios I.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

MOTIVATION: Lipids are a large and diverse group of biological molecules with roles in membrane formation, energy storage and signaling. Cellular lipidomes may contain tens of thousands of structures, a staggering degree of complexity whose significance is not yet fully understood. High-throughput mass spectrometry-based platforms provide a means to study this complexity, but the interpretation of lipidomic data and its integration with prior knowledge of lipid biology suffer from a lack of appropriate tools to manage the data and extract knowledge from it. RESULTS: To facilitate the description and exploration of lipidomic data and its integration with prior biological knowledge, we have developed a knowledge resource for lipids and their biology, SwissLipids. SwissLipids provides curated knowledge of lipid structures and metabolism, which is used to generate an in silico library of feasible lipid structures. These are arranged in a hierarchical classification that links mass spectrometry analytical outputs to all possible lipid structures, metabolic reactions and enzymes. SwissLipids provides a reference namespace for lipidomic data publication, data exploration and hypothesis generation. The current version of SwissLipids includes over 244 000 known and theoretically possible lipid structures, over 800 proteins, and curated links to published knowledge from over 620 peer-reviewed publications. We are continually updating the SwissLipids hierarchy with new lipid categories and new expert curated knowledge. AVAILABILITY: SwissLipids is freely available at http://www.swisslipids.org/. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Updates in Rhea - a manually curated resource of biochemical reactions.

Author: Aimo L.
Alcántara R.
Axelsen K.B.
Belda E.
Bougueleret L.
Bridge A.
Coudert E.
Hyka-Nouspikel N.
Lombardot T.
Morgat A.
Niknejad A.
Redaschi N.
Steinbeck C.
Xenarios I.
Zerara M.
Publication venue: Oxford University Press (OUP)
Publication date: 20/10/2014

Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together, these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models.

HAMAP in 2015: updates to the protein family classification and annotation system.

Author: Auchincloss A.H.
Baratin D.
Bougueleret L.
Bridge A.
Coudert E.
Cuche B.A.
de Castro E.
Keller G.
Pedruzzi I.
Poux S.
Redaschi N.
Rivoire C.
Xenarios I.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan, which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.

Updates in Rhea - an expert curated resource of biochemical reactions.

Author: Aimo L.
Axelsen K.B.
Bougueleret L.
Bridge A.
Coudert E.
Hyka-Nouspikel N.
Lombardot T.
Moretti S.
Morgat A.
Niknejad A.
Onwubiko J.
Pagni M.
Pozzato M.
Redaschi N.
Rosanoff S.
Xenarios I.
Publication venue: Oxford University Press (OUP)
Publication date: 26/10/2016

Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of expert-curated biochemical reactions designed for the functional annotation of enzymes and the description of metabolic networks. Rhea describes enzyme-catalyzed reactions covering the IUBMB Enzyme Nomenclature list as well as additional reactions, including spontaneously occurring reactions, using entities from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Here we describe developments in Rhea since our last report in the database issue of Nucleic Acids Research. These include the first implementation of a simple hierarchical classification of reactions, improved coverage of the IUBMB Enzyme Nomenclature list and additional reactions through continuing expert curation, and the development of a new website to serve this improved dataset.

OpenFluDB, a database for human and animal influenza virus

Author: A. Bairoch
A. Gleizes
Abed
Beigel
Belshe
Carr
Conenello
Cox
D. Kuznetsov
Edgar
Ferraris
Gubareva
Hulse
I. Xenarios
Ives
L. Bougueleret
Li
Long
Mishin
Monto
P. Le Mercier
Parvin
Perdue
R. Liechti
Seo
Smith
Stephens
Su
Publication venue: Oxford University Press
Publication date: 01/01/2010

Although influenza has been studied for more than 100 years, it remains one of the most prominent diseases, causing half a million human deaths every year. With the recent observation of new highly pathogenic H5N1 and H7N7 strains, and the appearance of the influenza pandemic caused by the H1N1 swine-like lineage, a collaborative effort to share observations on the evolution of this virus in both animals and humans has been established. The OpenFlu database (OpenFluDB) is a part of this collaborative effort. It contains genomic and protein sequences, as well as epidemiological data, from more than 27 000 isolates. The isolate annotations include virus type, host, geographical location and experimentally tested antiviral resistance. Putative enhanced pathogenicity as well as human adaptation propensity are computed from protein sequences. Each virus isolate can be associated with the laboratories that collected, sequenced and submitted it. Several analysis tools, including multiple sequence alignment, phylogenetic analysis and sequence similarity maps, enable rapid and efficient mining. The contents of OpenFluDB are supplied by direct user submission, as well as by a daily automatic procedure importing data from public repositories. Additionally, a simple mechanism facilitates the export of OpenFluDB records to GenBank. This resource has been successfully used to rapidly and widely distribute the sequences collected during the recent human swine flu outbreak and also as an exchange platform during the vaccine selection procedure. Database URL: http://openflu.vital-it.ch

HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot

Author: A. Bairoch
A. H. Auchincloss
Altschul
Attwood
Bennett
C. Lachaize
C. Rivoire
D. Baratin
E. Coudert
E. de Castro
Edgar
Fleischmann
G. Keller
Gattiker
Haft
I. Phan
K. Michoud
L. Bougueleret
Marcotte
Mons
Notredame
Salzberg
Sanger
Stothard
T. Lima
Thompson
V. Bulliard
Wu
Publication venue: Oxford University Press
Publication date: 01/01/2009

The growth in the number of completely sequenced microbial genomes (bacterial and archaeal) has generated a need for a procedure that provides UniProtKB/Swiss-Prot-quality annotation to as many protein sequences as possible. We have devised a semi-automated system, HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes), that uses manually built annotation templates for protein families to propagate annotation to all members of manually defined protein families, using very strict criteria. The HAMAP system is composed of two databases, the proteome database and the family database, and of an automatic annotation pipeline. The proteome database comprises biological and sequence information for each completely sequenced microbial proteome, and it offers several tools for CDS searches, BLAST options and retrieval of specific sets of proteins. The family database currently comprises more than 1500 manually curated protein families and their annotation templates, which are used to annotate proteins that belong to one of the HAMAP families. On the HAMAP website, individual sequences as well as whole genomes can be scanned against all HAMAP families. The system provides warnings for the absence of conserved amino acid residues, unusual sequence length, etc. Thanks to the implementation of HAMAP, more than 200 000 microbial proteins have been fully annotated in UniProtKB/Swiss-Prot (HAMAP website: http://www.expasy.org/sprot/hamap).

Infoscience - École polytechnique fédérale de Lausanne

Fifteen years SIB Swiss Institute of Bioinformatics: life science databases, tools and support.

Author: Altenhoff A.M.
Appel R.D.
Arnold K.
Bairoch A.
Bastian F.
Bergmann S.
Bougueleret L.
Bucher P.
Delorenzi M.
Lane L.
Le Mercier P.
Lisacek F.
Michielin O.
Palagi P.M.
Rougemont J.
Schwede T.
Stockinger H.
van Nimwegen E.
von Mering C.
Walther D.
Xenarios I.
Zavolan M.
Zdobnov E.M.
Zoete V.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) was created in 1998 as an institution to foster excellence in bioinformatics. It is renowned worldwide for its databases and software tools, such as UniProtKB/Swiss-Prot, PROSITE, SWISS-MODEL and STRING, all of which are accessible on ExPASy.org, SIB's Bioinformatics Resource Portal. This article provides an overview of the scientific and training resources SIB has consistently been offering to the life science community for more than 15 years.

Repository for Publications and Research Data

edoc

ZORA

Collaborative annotation of genes and proteins between UniProtKB/Swiss-Prot and dictyBase

Author: A. Auchincloss
A. Bairoch
A. Bridge
A. Gateau
A. Nikolskaya
B. Roechert
Baldauf
C. Hulo
C. Rivoire
D. Lieberherr
E. Boutet
E. Coudert
E. Stanley
Eichinger
F. Jungo
G. Keller
I. Pedruzzi
J. James
K. Axelsen
K. Sjolander
Krishnamurthy
L. Bougueleret
L. Lane
M. Feuermann
M. Moinat
M. Schneider
M. Tognolli
N. Farriol-Mathis
P. Brown
P. Fey
P. Gaudet
P. Lemercier
R.L. Chisholm
R.S. Datta
S. Braconi Quintaje
S. Duvaud
S. Ferro Rojas
S. Jimenez
S. Poux
T. de Oliveira Lima
U. Hinz
W.C. de Lima
Publication venue: Oxford University Press
Publication date: 01/01/2009

UniProtKB/Swiss-Prot, a curated protein database, and dictyBase, the Model Organism Database for Dictyostelium discoideum, have established a collaboration to improve data sharing. One of the major steps in this effort was the ‘Dicty annotation marathon’, a week-long exercise with 30 annotators aimed at achieving a major increase in the number of D. discoideum proteins represented in UniProtKB/Swiss-Prot. The marathon led to the annotation of over 1000 D. discoideum proteins in UniProtKB/Swiss-Prot. Concomitantly, a large number of updates were made in dictyBase concerning gene symbols, protein names and gene models. This exercise demonstrates how UniProtKB/Swiss-Prot can work in very close cooperation with model organism databases and how the annotation of proteins can be accelerated through those collaborations.